Statistical singing voice conversion based on direct waveform modification with global variance
Abstract
This paper presents techniques to improve the quality of voices generated through statistical singing voice conversion with direct waveform modification based on the spectrum differential (DIFFSVC). The DIFFSVC method makes it possible to convert the singing voice characteristics of a source singer into those of a target singer without using vocoder-based waveform generation. However, the quality of the converted singing voice is still degraded compared to that of a natural singing voice due to various factors, such as over-smoothing of the converted spectral parameter trajectory. To alleviate this over-smoothing, we propose a technique to restore the global variance of the converted spectral parameter trajectory within the framework of the DIFFSVC method. We also propose another technique that specifically avoids over-smoothing at unvoiced frames. Results of subjective and objective evaluations demonstrate that, compared to the conventional DIFFSVC, the proposed techniques significantly improve the speech quality of the converted singing voice while preserving the conversion accuracy of singer identity.
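The idea of restoring global variance can be illustrated with a generic variance-scaling postfilter. This is a minimal sketch under stated assumptions, not the DIFFSVC procedure proposed in the paper: the function names `global_variance` and `restore_gv`, the toy trajectory, and the target variance values are all hypothetical and chosen only for illustration.

```python
from math import sqrt

def global_variance(traj):
    """Return per-dimension (means, variances) over time.

    `traj` is a list of frames; each frame is a list of spectral
    parameters. The variance over all frames is the "global variance".
    """
    n = len(traj)
    dims = len(traj[0])
    means = [sum(frame[d] for frame in traj) / n for d in range(dims)]
    gv = [sum((frame[d] - means[d]) ** 2 for frame in traj) / n
          for d in range(dims)]
    return means, gv

def restore_gv(traj, target_gv, eps=1e-12):
    """Scale each dimension around its mean so its variance matches target_gv.

    Assumption: a simple variance-scaling postfilter, used here only to
    illustrate the concept; the paper's method may differ.
    """
    means, gv = global_variance(traj)
    scales = [sqrt(t / max(g, eps)) for t, g in zip(target_gv, gv)]
    return [[means[d] + scales[d] * (frame[d] - means[d])
             for d in range(len(frame))]
            for frame in traj]

# Toy over-smoothed trajectory (4 frames, 2 dimensions); values are made up.
smoothed = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0], [0.5, 1.5]]
target_gv = [0.5, 0.5]  # hypothetical variance of natural target parameters

restored = restore_gv(smoothed, target_gv)
restored_means, restored_gv = global_variance(restored)
```

In this sketch the scaling keeps each dimension's mean fixed and widens its dynamic range, which is the effect an over-smoothed trajectory lacks.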
Similar resources
Statistical singing voice conversion with direct waveform modification based on the spectrum differential
This paper presents a novel statistical singing voice conversion (SVC) technique with direct waveform modification based on the spectrum differential that can convert voice timbre of a source singer into that of a target singer without using a vocoder to generate converted singing voice waveforms. SVC makes it possible to convert singing voice characteristics of an arbitrary source singer into ...
Improvements of Voice Timbre Control Based on Perceived Age in Singing Voice Conversion
As one of the techniques enabling individual singers to produce varieties of voice timbre beyond their own physical constraints, a statistical voice timbre control technique based on perceived age has been developed. In this technique, the perceived age of a singing voice, which is the age of the singer as perceived by the listener, is used as one of the intuitively understandable measu...
The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016
This paper presents the NU-NAIST voice conversion (VC) system for the Voice Conversion Challenge 2016 (VCC 2016) developed by a joint team of Nagoya University and Nara Institute of Science and Technology. Statistical VC based on a Gaussian mixture model makes it possible to convert the speaker identity of a source speaker's voice into that of a target speaker by converting several speech parameters...
Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion
In this paper, we evaluate our proposed singing voice conversion method from various perspectives. To enable singers to freely control the timbre of their singing voice, we have proposed a singing voice conversion method based on many-to-many eigenvoice conversion (EVC) that makes it possible to convert the voice timbre of an arbitrary source singer into that of another arbitrary target singer using a p...
Statistical Voice Conversion with WaveNet-Based Waveform Generation
This paper presents a statistical voice conversion (VC) technique with WaveNet-based waveform generation. VC based on a Gaussian mixture model (GMM) makes it possible to convert the speaker identity of a source speaker into that of a target speaker. However, in the conventional vocoding process, various factors such as F0 extraction errors, parameterization errors and over-smoothing effects...